Optimal Advice
Authors
Abstract
Ko [Ko83] proved that the P-selective sets are in the advice class P/quadratic. We prove that the P-selective sets are in NP/linear ∩ coNP/linear. We show this to be optimal in terms of the amount of advice needed.
Similar resources
When Suboptimal Rules
This paper represents a paradigm shift in what advice agents should provide people. Contrary to what was previously thought, we empirically show that agents that dispense optimal advice will not necessarily facilitate the best improvement in people’s strategies. Instead, we claim that agents should at times suboptimally advise. We provide results demonstrating the effectiveness of a suboptimal ad...
Using Advice in Model-Based Reinforcement Learning
When a human is mastering a new task, they are usually not limited to exploring the environment, but also avail themselves of advice from other people. In this paper, we consider the use of advice expressed in a formal language to guide exploration in a model-based reinforcement learning algorithm. In contrast to constraints, which can eliminate optimal policies if they are not sound, advice is...
Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice
We prove non-asymptotic lower bounds on the expectation of the maximum of d independent Gaussian variables and the expectation of the maximum of d independent symmetric random walks. Both lower bounds recover the optimal leading constant in the limit. A simple application of the lower bound for random walks is an (asymptotically optimal) non-asymptotic lower bound on the minimax regret of onlin...
Coordination Advice: A Preliminary Investigation of Human Advice to Multiagent Teams
This paper introduces a new area of advice that is specific to advising a multiagent team: Coordination Advice. Coordination Advice differs from traditional advice because it pertains to coordinated tasks and interactions between agents. Given a large multiagent team interacting in a dynamic domain, optimal coordination is a difficult challenge. Human advisors can improve such coordination via ...
An Optimal Algorithm for Linear Bandits
We provide the first algorithm for online bandit linear optimization whose regret after T rounds is of order √(Td ln N) on any finite class X ⊆ R^d of N actions, and of order d√T (up to log factors) when X is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We also present an app...
Journal:
- Theor. Comput. Sci.
Volume 154, Issue
Pages -
Publication date: 1996